Poster: Brak Date: May 12, 2004 11:39am
Forum: petabox Subject: Re: Virtual Disk Software?

The software which runs on the petabox is arbitrary. It could be Linux, it could be (gasp!), actually, I won't even say it.

I believe the main point of the petabox is the thought of providing a petabyte of available storage. What you do with 800 nodes and 3200 disks is up to you, really.
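For a rough sense of scale, the figures above can be turned into a per-disk number. This is only back-of-the-envelope arithmetic: it assumes a decimal petabyte (10^15 bytes) and ignores filesystem overhead and any redundancy, neither of which is specified in this post.

```python
# Back-of-the-envelope sizing from the figures in the post.
# Assumptions (not from the post): decimal petabyte, no overhead, no redundancy.
PETABYTE = 10**15   # bytes, decimal definition (assumed)
NODES = 800
DISKS = 3200

disks_per_node = DISKS // NODES      # disks in each node
bytes_per_disk = PETABYTE // DISKS   # raw bytes each disk must hold
gb_per_disk = bytes_per_disk / 10**9

print(f"{disks_per_node} disks per node, about {gb_per_disk} GB per disk")
```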

The current design is meant to be very space-, power-, and cost-efficient for storing X amount of data, both in deployment and in ongoing operation. Power and cooling are huge considerations when you are building something of this scale.

Storing data is the main purpose. The most compelling reason the Internet Archive does not virtualize the storage across the cluster with RAID, GFS, or Lustre is a design decision grounded in what should be the mission of any archive: keeping as much of the data available for as long a period as possible. Having multiple petaboxes spread around the world with the same data is one way of ensuring preservation. Perhaps more importantly, if at some arbitrary point in the future a single hard drive is dug out of the side of a hill where a digital archive used to stand, there is at least some hope of getting some part of the data off the drive in a useful form.

A drive that was part of some virtualized SAN with GFS and RAID would be more or less useless to a digital archeologist. However, a drive with actual files has some hope of providing useful data, music, books or other information.
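To illustrate the difference, here is a minimal sketch in Python. It is a toy model, not how GFS, Lustre, or any particular RAID level actually lays out data: `stripe` assumes simple round-robin striping in fixed-size chunks, and `whole_files` assumes each file is placed intact on a single disk. Both function names and the sample files are made up for the example.

```python
def stripe(data: bytes, n_disks: int, chunk: int) -> list[bytes]:
    """Spread data round-robin across disks in fixed-size chunks (toy striping)."""
    disks = [bytearray() for _ in range(n_disks)]
    for offset in range(0, len(data), chunk):
        disks[(offset // chunk) % n_disks] += data[offset:offset + chunk]
    return [bytes(d) for d in disks]


def whole_files(files: dict[str, bytes], n_disks: int) -> list[dict[str, bytes]]:
    """Place each file intact on one disk, rotating through the disks by file."""
    disks: list[dict[str, bytes]] = [{} for _ in range(n_disks)]
    for idx, name in enumerate(sorted(files)):
        disks[idx % n_disks][name] = files[name]
    return disks


# Hypothetical sample content standing in for archived items.
files = {
    "book.txt": b"It was the best of times, it was the worst of times.",
    "song.txt": b"la la la " * 6,
}

# Virtualized case: the files form one logical volume, which is then striped.
volume = b"".join(files[name] for name in sorted(files))
striped = stripe(volume, n_disks=4, chunk=8)

# Plain case: each disk holds complete files.
plain = whole_files(files, n_disks=4)

# Pretend only disk 0 survives.
print("striped disk 0:", striped[0])  # interleaved fragments of the volume
print("plain disk 0:  ", plain[0])    # a complete, readable file
```

With only disk 0 in hand, the striped layout yields every fourth 8-byte fragment of the volume and no complete file, while the plain layout yields `book.txt` in full.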

Virtualization could certainly be done with this hardware for other applications, but different decisions would perhaps be made regarding switching fabric and CPU power.

This post was modified by Brak on 2004-05-12 18:39:42